fix(tf): handle tensor model bias reshape in backend conversion #4957

Closed
Copilot wants to merge 5 commits into devel from copilot/fix-4411

Contributor

Copilot AI commented Sep 3, 2025

This PR fixes a tensor model conversion bug that prevented PyTorch models with GeneralFitting types (dipole, polar) from being converted to TensorFlow format.

Problem

When converting PyTorch models with GeneralFitting types (dipole, polar) to TensorFlow using dp convert-backend, the conversion failed with:

ValueError: cannot reshape array of size 6 into shape (2,100)

This error occurred because the TensorFlow deserialization code attempted to reshape the out_bias tensor to match the bias_atom_e tensor shape, but these tensors have fundamentally different purposes and incompatible shapes in GeneralFitting models:

  • out_bias: shape varies by fitting type (e.g., [1, ntypes, 3] for dipole, which is 6 elements when ntypes = 2); represents the output bias
  • bias_atom_e: shape [ntypes, embedding_width] (e.g., [2, 100], which is 200 elements); represents the internal fitting network bias
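
The size mismatch can be reproduced with NumPy alone. The following is a minimal sketch, assuming ntypes = 2 (matching the (2,100) in the error) and zero-filled placeholder arrays; the variable names mirror the tensors described above and are not taken from the actual deserialization code.

```python
import numpy as np

ntypes = 2  # assumed: two atom types, consistent with the (2,100) in the error

# Placeholder for a dipole model's out_bias: shape [1, ntypes, 3], i.e. 6 elements.
out_bias = np.zeros((1, ntypes, 3))
# Placeholder for bias_atom_e with the shape quoted in the PR: [2, 100], i.e. 200 elements.
bias_atom_e = np.zeros((ntypes, 100))

try:
    # A reshape must preserve the element count, so 6 -> 200 cannot succeed.
    out_bias.reshape(bias_atom_e.shape)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

A reshape never changes the number of elements, so no choice of target layout can make the two arrays agree; the operation has to be skipped or replaced for these fitting types.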

Solution

Modified the deserialization logic in deepmd/tf/model/model.py to:

  1. Categorize fitting types correctly: Distinguish between InvarFitting types (ener, dos, property) and GeneralFitting types (dipole, polar)
  2. Apply reshape logic selectively: Only apply the original reshape logic to InvarFitting types where out_bias and bias_atom_e have compatible shapes
  3. Skip reshape for GeneralFitting types: Avoid the incompatible reshape operation for dipole and polar models
  4. Preserve backward compatibility: Keep the original logic for energy models and other InvarFitting types

Fitting Types Handled

  • InvarFitting types (reshape applied): ener, energy, dos, property, direct_force, direct_force_ener
  • GeneralFitting types (reshape skipped): dipole, polar
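
The categorization above could be sketched as follows. This is an illustration under stated assumptions, not the code in deepmd/tf/model/model.py: the helper names (`should_reshape_out_bias`, `convert_bias`) and the constant names are hypothetical, returning `bias_atom_e` unchanged for the skipped case is an assumption, and only the fitting type strings come from the lists above.

```python
import numpy as np

# Fitting type strings as listed in the PR description; constant names are hypothetical.
INVAR_FITTING_TYPES = frozenset(
    {"ener", "energy", "dos", "property", "direct_force", "direct_force_ener"}
)
GENERAL_FITTING_TYPES = frozenset({"dipole", "polar"})


def should_reshape_out_bias(fitting_type: str) -> bool:
    """Return True if out_bias may be reshaped to the bias_atom_e shape."""
    if fitting_type in INVAR_FITTING_TYPES:
        return True
    if fitting_type in GENERAL_FITTING_TYPES:
        return False
    raise ValueError(f"unknown fitting type: {fitting_type!r}")


def convert_bias(fitting_type, out_bias, bias_atom_e):
    """Hypothetical helper: reshape out_bias only where the shapes are compatible."""
    if should_reshape_out_bias(fitting_type):
        return out_bias.reshape(bias_atom_e.shape)
    # dipole / polar: skip the reshape and leave bias_atom_e as it was (assumption).
    return bias_atom_e


# Illustrative shapes only: an energy-like case reshapes, a dipole-like case does not.
ener_bias = convert_bias("ener", np.zeros((1, 2, 1)), np.zeros((2, 1)))
dipole_bias = convert_bias("dipole", np.zeros((1, 2, 3)), np.zeros((2, 100)))
print(ener_bias.shape, dipole_bias.shape)
```

Raising on an unrecognized type, rather than silently picking a branch, is a design choice of this sketch: it makes a newly added fitting type fail loudly instead of hitting the incompatible reshape again.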

Reproduction Steps

The issue can be reproduced with:

cd examples/water_tensor/dipole
dp --pt train dipole_input_torch.json --skip-neighbor-stat
dp --pt freeze -o dw_model.pth
dp convert-backend dw_model.pth dw_model.pb  # Previously failed, now works

cd ../polar
dp --pt train polar_input_torch.json --skip-neighbor-stat
dp --pt freeze -o pw_model.pth
dp convert-backend pw_model.pth pw_model.pb  # Also works now

Testing

  • ✅ Successfully converts PyTorch dipole models to TensorFlow
  • ✅ Successfully converts PyTorch polar models to TensorFlow
  • ✅ Converted models load and function correctly
  • ✅ Energy models continue to work (no regression)
  • ✅ All fitting types handled correctly based on their class hierarchy

Fixes #4411.



Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [BUG] Cannot convert tensor pt model to tf model" to "fix(tf): handle dipole model bias reshape in backend conversion" Sep 3, 2025
Copilot AI requested a review from njzjz September 3, 2025 15:41
Comment thread deepmd/tf/model/model.py
Handle all GeneralFitting types (dipole, polar) vs InvarFitting types (ener, dos, property) correctly in backend conversion bias reshaping logic.

Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
Copilot AI changed the title from "fix(tf): handle dipole model bias reshape in backend conversion" to "fix(tf): handle tensor model bias reshape for all fitting types in backend conversion" Sep 4, 2025
Copilot AI requested a review from njzjz September 4, 2025 07:15
njzjz requested a review from Copilot September 4, 2025 07:19
github-actions Bot added the Python label Sep 4, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a backend conversion bug that prevented PyTorch models with GeneralFitting types (dipole, polar) from being converted to TensorFlow format. The issue occurred when the TensorFlow deserialization code attempted to reshape incompatible bias tensors.

  • Adds conditional logic to distinguish between InvarFitting and GeneralFitting types during tensor model deserialization
  • Applies the original reshape logic only to compatible InvarFitting types (ener, dos, property)
  • Skips the problematic reshape operation for GeneralFitting types (dipole, polar) where bias tensors have different purposes and incompatible shapes

Comment thread deepmd/tf/model/model.py Outdated
Comment thread deepmd/tf/model/model.py Outdated
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

codecov Bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.28%. Comparing base (68ea2aa) to head (95aef5d).

Files with missing lines | Patch % | Lines
deepmd/tf/model/model.py | 80.00% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4957   +/-   ##
=======================================
  Coverage   84.28%   84.28%           
=======================================
  Files         705      705           
  Lines       69097    69099    +2     
  Branches     3572     3573    +1     
=======================================
+ Hits        58239    58241    +2     
  Misses       9717     9717           
  Partials     1141     1141           


njzjz marked this pull request as ready for review September 4, 2025 09:00
njzjz changed the title from "fix(tf): handle tensor model bias reshape for all fitting types in backend conversion" to "fix(tf): handle tensor model bias reshape in backend conversion" Sep 4, 2025
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
njzjz marked this pull request as draft September 5, 2025 10:32
Member

njzjz commented Sep 7, 2025

Close in favour of #4962.

njzjz closed this Sep 7, 2025
njzjz deleted the copilot/fix-4411 branch September 7, 2025 17:35